Abstract. Recently, 2-protected nodes were studied in the context of ordered trees
and $k$-trees. These nodes have a distance of at least 2 to each leaf. Here, we study
digital search trees, which are binary trees, but with a different underlying probability
distribution. Our result says that, grosso modo, some 31% of the nodes are
2-protected. Methods include exponential generating functions, contour integration,
and some elements from $q$-analysis.



1. Introduction 

Cheon and Shapiro [2] started the study of 2-protected nodes in trees. A node enjoys
this property if its distance to any leaf is at least 2. A simpler notion is 1-protected:
exactly the nodes that are not leaves are 1-protected. In the cited paper, the family
of ordered trees was considered, and it was found that asymptotically a proportion of
$\frac{1}{6}$ of the nodes is 2-protected. Recently, Mansour [9] complemented these results by
studying $k$-ary trees.

In the present note, we study the analogous quantity for Digital Search Trees (DSTs),
a structure that is important in Computer Science [7]. As trees, they are binary trees,
but the (probability) distribution is quite different. From a mathematical point of view,
they always lead to interesting and nontrivial considerations, with a flair of $q$-analysis.
Here are a few papers of relevance: [3j [101 El El E] 

DSTs are constructed as follows. Given a sequence of binary strings, we place the first
in the root node; those starting with "0" ("1") are directed to the left (right) subtree of
the root, and the subtrees are constructed recursively by the same procedure, but with the
first bit removed when comparisons are made. See Figure 1 for an illustration.
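The construction can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the names `Node`, `insert`, `count_protected` and `simulate` are hypothetical, keys are assumed to be bit strings long enough that no two keys collide along a search path, and "removing the first bit" is implemented by inspecting bit number `depth` instead.

```python
import random


class Node:
    """One DST node: stores a whole key and up to two children."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert a bit string and return the (possibly new) root.

    At depth d the d-th bit of the key decides the direction, which is
    equivalent to stripping one leading bit per level.
    """
    if root is None:
        return Node(key)
    node, depth = root, 0
    while True:
        side = "left" if key[depth] == "0" else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node, depth = child, depth + 1


def leaf_distance_and_protected(node):
    """Return (distance to the nearest leaf, number of 2-protected nodes)."""
    children = [c for c in (node.left, node.right) if c is not None]
    if not children:
        return 0, 0  # a leaf has distance 0 to itself and is not protected
    below = [leaf_distance_and_protected(c) for c in children]
    distance = 1 + min(d for d, _ in below)
    count = sum(c for _, c in below) + (1 if distance >= 2 else 0)
    return distance, count


def count_protected(keys):
    """Build a DST from the keys (in order) and count its 2-protected nodes."""
    root = None
    for key in keys:
        root = insert(root, key)
    return 0 if root is None else leaf_distance_and_protected(root)[1]


def simulate(n, trials, seed=0, bits=32):
    """Monte Carlo estimate of the average number of 2-protected nodes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        keys = [f"{rng.getrandbits(bits):0{bits}b}" for _ in range(n)]
        total += count_protected(keys)
    return total / trials
```

For instance, the keys `"100", "000", "001"` produce a path of three nodes, whose root is the only 2-protected node, while `"010", "001", "110", "000"` produce a tree whose root has a leaf as right child, so no node is 2-protected.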

In the following section we will show that the proportion of 2-protected nodes in the 
DST model is about 31%; a more detailed statement will be given later. 

We collect here a few notations. These quantities belong to the realm of $q$-series
and can be found in [1], although with a slightly different notation:

$$Q_m = \prod_{k=1}^{m}\Bigl(1-\frac{1}{2^k}\Bigr), \qquad Q_\infty = \prod_{k=1}^{\infty}\Bigl(1-\frac{1}{2^k}\Bigr), \qquad Q(t) = \prod_{k=1}^{\infty}\Bigl(1-\frac{t}{2^k}\Bigr).$$

There is a formula that is equivalent to one of Euler's partition identities:

$$Q(t) = \sum_{m\ge 0} a_{m+1}t^m \quad\text{with}\quad a_{m+1} = \frac{(-1)^m 2^{-\binom{m+1}{2}}}{Q_m}.$$

Finally, we will use $L = \log 2$.
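As a sanity check of this notation, the product $Q(t)$ and its power series can be compared numerically. The sketch below assumes the coefficients $a_{m+1} = (-1)^m 2^{-m(m+1)/2}/Q_m$, which is what Euler's identity gives for base $1/2$; the function names and the truncation parameters are arbitrary illustrative choices.

```python
from fractions import Fraction


def Q_finite(m):
    """Q_m = prod_{k=1}^{m} (1 - 2^{-k}) as an exact fraction."""
    q = Fraction(1)
    for k in range(1, m + 1):
        q *= 1 - Fraction(1, 2**k)
    return q


def a(index):
    """a_{m+1} with index = m + 1 (assumed form, see the lead-in)."""
    m = index - 1
    return Fraction((-1) ** m, 2 ** (m * (m + 1) // 2)) / Q_finite(m)


def Q_product(t, terms=200):
    """Q(t) evaluated from the (truncated) infinite product."""
    p = 1.0
    for k in range(1, terms + 1):
        p *= 1.0 - t / 2.0**k
    return p


def Q_series(t, terms=40):
    """Q(t) evaluated from the (truncated) power series."""
    return sum(float(a(m + 1)) * t**m for m in range(terms))


if __name__ == "__main__":
    for t in (0.3, 1.0, -2.0):
        print(t, Q_product(t), Q_series(t))
```

The value at $t = 1$ is $Q_\infty$, and the rapid decay of the $a_m$ is visible from how few series terms are needed.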

2. Average number of 2-protected nodes 

Denote by $l_n$ the average number of 2-protected nodes in a random DST, built from
$n$ data. By random we mean that whenever a decision has to be made whether to go
down to the left or right, a fair coin is tossed, and a direction is chosen with probability
$\frac{1}{2}$.

The following recursion follows from the observation that, provided we have $n + 1$
data, $k$ go to the left and $n - k$ go to the right, and such a split happens with probability
$\binom{n}{k}2^{-n}$. One node goes to the root and is always 2-protected except in the instances
$k = 1$ or $k = n - 1$. Therefore

$$l_{n+1} = \sum_{k=0}^{n}\binom{n}{k}2^{-n}\bigl(l_k + l_{n-k} + 1\bigr) - \sum_{k=1 \text{ or } n-1}\binom{n}{k}2^{-n}
= 1 + 2^{1-n}\sum_{k=0}^{n}\binom{n}{k}l_k - n2^{1-n}.$$

This recursion is true for $n \ge 3$, with initial conditions $l_0 = l_1 = l_2 = 0$, $l_3 = \frac{1}{2}$.
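The recursion is easy to evaluate exactly. The following sketch uses rational arithmetic; the function name `protected_averages` is a hypothetical choice, and the initial values are the ones stated above.

```python
from fractions import Fraction
from math import comb


def protected_averages(n_max):
    """Return [l_0, ..., l_{n_max}] from the recursion, as exact fractions."""
    l = [Fraction(0), Fraction(0), Fraction(0), Fraction(1, 2)]
    for n in range(3, n_max):
        # l_{n+1} = 1 + 2^{1-n} * sum_k C(n,k) l_k - n * 2^{1-n}
        s = sum(comb(n, k) * l[k] for k in range(n + 1))
        l.append(1 + Fraction(2, 2**n) * s - Fraction(2 * n, 2**n))
    return l[: n_max + 1]


if __name__ == "__main__":
    for n, value in enumerate(protected_averages(8)):
        print(n, value, float(value))
```

The first values beyond the initial conditions are $l_4 = 3/8$ and $l_5 = 51/64$; for instance, with four data the root is 2-protected only when all three remaining data go to the same side.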
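Our treatment follows [3]. We introduce the exponential generating function
$L(z) = \sum_{n\ge 0} l_n z^n/n!$ and translate the recursion:

$$\sum_{n\ge 3} l_{n+1}\frac{z^n}{n!} = \sum_{n\ge 3}\frac{z^n}{n!} + \sum_{n\ge 3} 2^{1-n}\frac{z^n}{n!}\sum_{k=0}^{n}\binom{n}{k}l_k - \sum_{n\ge 3} n2^{1-n}\frac{z^n}{n!}$$

or

$$\sum_{n\ge 0} l_{n+1}\frac{z^n}{n!} - \frac{z^2}{4} = \sum_{n\ge 3}\frac{z^n}{n!} + \sum_{n\ge 0} 2^{1-n}\frac{z^n}{n!}\sum_{k=0}^{n}\binom{n}{k}l_k - \sum_{n\ge 3} n2^{1-n}\frac{z^n}{n!},$$

which leads after some simple manipulations to

$$L'(z) = e^{z} - ze^{z/2} - 1 + \frac{z^2}{4} + 2e^{z/2}L\Bigl(\frac{z}{2}\Bigr).$$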



ON PROTECTED NODES IN DIGITAL SEARCH TREES 3 

Now we introduce the Poisson generating function $M(z) = e^{-z}L(z) = \sum_{n\ge 0} m_n z^n/n!$
and rewrite the equation:

$$M'(z) + M(z) = 1 - ze^{-z/2} - e^{-z} + \frac{z^2}{4}e^{-z} + 2M\Bigl(\frac{z}{2}\Bigr).$$

For $n \ge 1$, we can read off the coefficients of $z^n/n!$:

$$m_{n+1} = -(1 - 2^{1-n})m_n + n(-1)^n 2^{1-n} - (-1)^n + \frac{n(n-1)}{4}(-1)^n.$$
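The coefficient recursion can be cross-checked against the numbers $l_n$. Since $M(z) = e^{-z}L(z)$, the coefficients are $m_n = \sum_k \binom{n}{k}(-1)^{n-k}l_k$; the sketch below computes them from the recursion for $l_n$ and evaluates the right-hand side of the recursion for $m_{n+1}$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def protected_averages(n_max):
    """[l_0, ..., l_{n_max}] from the recursion for l_n (n_max >= 3)."""
    l = [Fraction(0), Fraction(0), Fraction(0), Fraction(1, 2)]
    for n in range(3, n_max):
        s = sum(comb(n, k) * l[k] for k in range(n + 1))
        l.append(1 + Fraction(2, 2**n) * s - Fraction(2 * n, 2**n))
    return l


def poisson_coefficients(l):
    """m_n = sum_k C(n,k) (-1)^(n-k) l_k, from M(z) = exp(-z) L(z)."""
    return [
        sum(comb(n, k) * (-1) ** (n - k) * l[k] for k in range(n + 1))
        for n in range(len(l))
    ]


def recursion_rhs(m, n):
    """Right-hand side of the recursion for m_{n+1}, valid for n >= 1."""
    sign = (-1) ** n
    return (
        -(1 - Fraction(2, 2**n)) * m[n]
        + sign * Fraction(2 * n, 2**n)
        - sign
        + sign * Fraction(n * (n - 1), 4)
    )


if __name__ == "__main__":
    m = poisson_coefficients(protected_averages(10))
    for n in range(1, 10):
        print(n + 1, m[n + 1], recursion_rhs(m, n) == m[n + 1])
```

The first nonzero values are $m_3 = 1/2$, $m_4 = -13/8$ and $m_5 = 251/64$.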
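In order to solve it, we rewrite it as

$$\frac{m_{n+1}(-1)^{n}}{Q_{n-1}} = \frac{m_{n}(-1)^{n-1}}{Q_{n-2}} + \frac{n2^{1-n} - 1 + \frac{n(n-1)}{4}}{Q_{n-1}},$$

which can be summed (using $m_2 = 0$) and leads to

$$\frac{m_{N+1}(-1)^{N}}{Q_{N-1}} = \sum_{n=2}^{N}\frac{n2^{1-n} - 1 + \frac{n(n-1)}{4}}{Q_{n-1}}$$

and eventually to

$$m_N = Q_{N-2}(-1)^N\sum_{n=1}^{N-2}\frac{1 - (n+1)2^{-n} - \frac{n(n+1)}{4}}{Q_n}.$$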

Since

$$l_N = \sum_{k=2}^{N}\binom{N}{k}m_k,$$

we found the following explicit formula that we formulate as a theorem.

Theorem 1. The average number of 2-protected nodes in random DSTs of size $N \ge 1$
is exactly given by

$$l_N = \sum_{k=2}^{N}\binom{N}{k}(-1)^k Q_{k-2}\sum_{n=1}^{k-2}\frac{1 - (n+1)2^{-n} - \frac{n(n+1)}{4}}{Q_n}.$$
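The explicit formula of Theorem 1 can be compared with the recursion in exact arithmetic. The following is a sketch with hypothetical function names; it reads the formula as a double sum over $2 \le k \le N$ and $1 \le n \le k-2$ with summand $\bigl(1-(n+1)2^{-n}-n(n+1)/4\bigr)/Q_n$.

```python
from fractions import Fraction
from math import comb


def Q(m):
    """Q_m = prod_{k=1}^{m} (1 - 2^{-k})."""
    q = Fraction(1)
    for k in range(1, m + 1):
        q *= 1 - Fraction(1, 2**k)
    return q


def explicit_formula(N):
    """l_N from the alternating double sum of Theorem 1."""
    total = Fraction(0)
    for k in range(2, N + 1):
        inner = sum(
            (1 - Fraction(n + 1, 2**n) - Fraction(n * (n + 1), 4)) / Q(n)
            for n in range(1, k - 1)  # n = 1, ..., k - 2
        )
        total += comb(N, k) * (-1) ** k * Q(k - 2) * inner
    return total


def by_recursion(N):
    """l_N from the recursion, used here as the reference value."""
    l = [Fraction(0), Fraction(0), Fraction(0), Fraction(1, 2)]
    for n in range(3, N):
        s = sum(comb(n, k) * l[k] for k in range(n + 1))
        l.append(1 + Fraction(2, 2**n) * s - Fraction(2 * n, 2**n))
    return l[N]


if __name__ == "__main__":
    for N in range(1, 13):
        print(N, explicit_formula(N), explicit_formula(N) == by_recursion(N))
```

Because the sum alternates and the binomial coefficients grow quickly, exact fractions are used; a floating-point evaluation of the same formula would lose accuracy for larger $N$.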

Now we turn to the asymptotic evaluation of $l_N$ as $N \to \infty$. Again, we follow the
approach in [3] and use Rice's integrals, which means that we are able to rewrite $l_N$ as a
contour integral. Changing the contour of integration and collecting residues produces 
the asymptotic expansion of interest. Many examples have been described in 0j. In 
order to do so, one must extend the function 

$$Q_{k-2}\sum_{n=1}^{k-2}\frac{1 - (n+1)2^{-n} - \frac{n(n+1)}{4}}{Q_n}$$

so that it makes sense for any complex k, not just integers. This will be discussed now. 

We have $Q_{k-2} = Q_\infty/Q(2^{2-k})$, and this makes sense for any $k$. Now we have, using
Euler's identity mentioned in the Introduction,

$$\frac{1}{Q_n} = \frac{Q(2^{-n})}{Q_\infty} = \frac{1}{Q_\infty}\sum_{m\ge 0} a_{m+1}2^{-mn},$$

and this makes sense for any $n$, since the smallness of the $a_m$'s handles all convergence
issues. Therefore

$$\sum_{n=1}^{k-2}\frac{1 - (n+1)2^{-n} - \frac{n(n+1)}{4}}{Q_n} = \frac{1}{Q_\infty}\sum_{m\ge 0} a_{m+1}\sum_{n=1}^{k-2}\Bigl(1 - (n+1)2^{-n} - \frac{n(n+1)}{4}\Bigr)2^{-mn}.$$

The inner sum (on $n$) can be explicitly evaluated, but since it is long and ugly, we
don't display it here. The resulting form (that we keep in our Maple calculation) can
be used for any $k \in \mathbb{C}$.
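The integral expression is

$$l_N = -\frac{1}{2\pi i}\oint_{\mathcal{C}}\frac{\Gamma(N+1)\Gamma(-z)}{\Gamma(N+1-z)}\,\psi(z)\,dz,$$

where $\mathcal{C}$ encircles the poles $2, 3, \ldots, N$ and no others. The function $\psi(z)$ is the extension
of

$$Q_{k-2}\sum_{n=1}^{k-2}\frac{1 - (n+1)2^{-n} - \frac{n(n+1)}{4}}{Q_n}$$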

as just discussed. Changing the contour, one encounters other poles. They must be
subtracted and produce the asymptotic expansion that we need. The main contribution
comes from $z = 1$. There are also poles at $z = 1 + \chi_k$, with $\chi_k = \frac{2\pi i k}{L}$ ($k \ne 0$), and they
contribute a tiny oscillating function $N\cdot\delta(\log_2 N)$, where the amplitude of $\delta(x)$ is
typically smaller than $10^{-5}$. In order to keep this note short and crisp, we refrain
from computing this function explicitly. It is not difficult, and there are many similar
examples in the literature. So we concentrate now on $z = 1$, and we will find a simple
pole. As a first step, we consider

$$\lim_{k\to 1} Q_{k-2}\sum_{n=1}^{k-2}\Bigl(1 - (n+1)2^{-n} - \frac{n(n+1)}{4}\Bigr)2^{-mn}.$$

This limit can be computed by Maple, with the result

$$b_m := \frac{1}{4L}\,\frac{B(2^{-m})}{(2^{-m}-1)^3(2^{-m}-2)^2}$$

and $B(x) := 16L - 48xL + 48x^2L - 16x^3L - 20 + 60x - 69x^2 + 36x^3 - 7x^4 - 8x\log(x) +
12x^2\log(x) - 10x^3\log(x) + 4x^4\log(x)$, where $\log$ denotes the natural logarithm.
Note that $b_0$ is interpreted as a limit:

$$b_0 = \lim_{x\to 1}\frac{1}{4L}\,\frac{B(x)}{(x-1)^3(x-2)^2} = \frac{37}{12L} - 4.$$

So we are left with the negative residue of

$$\frac{\Gamma(N+1)\Gamma(-z)}{\Gamma(N+1-z)}$$

at $z = 1$, which is just $N$. Summarizing, we found the asymptotic behaviour.
Theorem 2. The average number $l_N$ of 2-protected nodes in random DSTs of size $N$
admits the asymptotic expansion

$$l_N = N\cdot\frac{1}{Q_\infty}\sum_{m\ge 0} a_{m+1}b_m + N\cdot\delta(\log_2 N) + O(1),$$

where the numerical constant evaluates to 0.30707981393605921828549 .... The tiny
periodic function $\delta(x)$ has a Fourier expansion that could be computed in principle. The
remainder term $O(1)$ stems from the next pole at $z = 0$.
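The numerical constant can be reproduced in floating point from the series $\frac{1}{Q_\infty}\sum_{m\ge 0} a_{m+1}b_m$. The sketch below rests on several assumptions that are not spelled out in the text: $a_{m+1} = (-1)^m 2^{-m(m+1)/2}/Q_m$, the prefactor $1/(4L)$ and the squared factor $(2^{-m}-2)^2$ in $b_m$, natural logarithms in $B(x)$, and the value $b_0 = 37/(12L) - 4$, which was obtained here by taking the limit $x \to 1$ by hand. The truncation limits are arbitrary.

```python
import math

L = math.log(2)


def B(x):
    """The polynomial-logarithmic numerator B(x), with natural logarithms."""
    lg = math.log(x)
    return (
        16 * L - 48 * x * L + 48 * x**2 * L - 16 * x**3 * L
        - 20 + 60 * x - 69 * x**2 + 36 * x**3 - 7 * x**4
        - 8 * x * lg + 12 * x**2 * lg - 10 * x**3 * lg + 4 * x**4 * lg
    )


def b(m):
    """b_m; the case m = 0 uses the hand-derived limit (an assumption)."""
    if m == 0:
        return 37 / (12 * L) - 4
    x = 2.0 ** (-m)
    return B(x) / (4 * L * (x - 1) ** 3 * (x - 2) ** 2)


def Q(m):
    """Q_m = prod_{k=1}^{m} (1 - 2^{-k}) in floating point."""
    q = 1.0
    for k in range(1, m + 1):
        q *= 1.0 - 2.0 ** (-k)
    return q


def a(index):
    """a_{m+1} with index = m + 1 (assumed form, see the lead-in)."""
    m = index - 1
    return (-1) ** m * 2.0 ** (-(m * (m + 1) // 2)) / Q(m)


def protected_constant(terms=30):
    """(1 / Q_infinity) * sum_{m < terms} a_{m+1} b_m."""
    q_inf = Q(200)  # the product has converged to double precision long before
    return sum(a(m + 1) * b(m) for m in range(terms)) / q_inf


if __name__ == "__main__":
    print(protected_constant())
```

Only a handful of terms contribute noticeably, because $a_{m+1}$ decays like $2^{-m(m+1)/2}$.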

One referee has suggested to give the explicit expression of the periodic function $\delta(x)$
without proof. Here it is:

ln2 m 



Qoo ^^ m+1 2L2(2- - l) 2 (2-+i - 1) 

x [iL(7 - 15 ■ 2 m + 10 • 4 m ) - 27r/(2 m+1 - 1) 
For example, $l_{500}/500 = 0.305710\ldots$.
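A finite-size value of this kind can be obtained directly from the recursion for $l_n$, assuming that the quoted number is the ratio $l_N/N$ at $N = 500$. The sketch below uses floating point, which is adequate here because the recursion only adds positive terms apart from the tiny correction $n2^{1-n}$; the function name is hypothetical.

```python
from math import comb


def protected_ratio(N):
    """Return l_N / N, with l_N computed from the recursion in floats (N >= 3)."""
    l = [0.0, 0.0, 0.0, 0.5]
    for n in range(3, N):
        scale = 2.0 ** (1 - n)
        # l_{n+1} = 1 + 2^{1-n} * sum_k C(n,k) l_k - n * 2^{1-n}
        s = sum(comb(n, k) * l[k] for k in range(3, n + 1))
        l.append(1.0 + scale * s - n * scale)
    return l[N] / N


if __name__ == "__main__":
    for N in (10, 100, 500):
        print(N, protected_ratio(N))
```

The ratio approaches the constant of Theorem 2 from below, with a gap of order $1/N$.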

Remark. Flajolet and Sedgewick in [3] solved an open problem of Knuth [7], and
considered the number of endnodes. They found this to be on average asymptotic to $\beta\cdot N$,
with $\beta = 0.372046812\ldots$. Again, there are tiny oscillations. The quantity $(1 - \beta)N$
is (asymptotically) the number of 1-protected nodes. So, there are roughly 63%
1-protected nodes, and our new results say that there are about 31% 2-protected nodes.